Chapter 3 Interpolation

In this tutorial, you will learn how to use R to apply different interpolation methods to your data sets. We will use a couple of R packages that are familiar to us, like sf and mapview, but also some new ones. Most notably, we will use fields (Douglas Nychka et al. 2021) for splines, and gstat (Gräler, Pebesma, and Heuvelink 2016) as well as automap (Hiemstra et al. 2008) for distance weighting and kriging.

library(automap)
library(ggplot2)
library(sf)
library(sp)
library(dplyr)
library(fields)
library(gstat)
library(tmap)
library(mapview)
library(magrittr) # provides the %<>% assignment pipe used below

3.1 The data

We will use the LUCAS soil data. LUCAS is an acronym for Land Use/Land Cover Area Frame Survey. This database contains approximately 20,000 soil samples from all EU27 countries with the exception of Romania and Bulgaria. You can download a subset of the LUCAS database I created for this course here.

Next, we load the LUCAS data.

lucas <- readRDS("data/lucas_saxony.rds")

As always, we first inspect the new object.

class(lucas)
## [1] "sf"         "data.frame"

The glimpse() function from dplyr is similar to str() from base R.

glimpse(lucas)
## Rows: 79
## Columns: 23
## $ Point_ID <int> 44743044, 44763132, 44763142, 44883126, 44923048, 44923066, 44923084, 44923152, 44963130, 44983054, 44983158, 45003106…
## $ Coarse   <int> 34, NA, NA, NA, 23, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Clay     <int> 22, NA, NA, NA, 22, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Sand     <int> 26, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ Silt     <int> 52, NA, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ pH_CaCl2 <dbl> 5.4, 6.5, 6.6, 7.3, 5.3, 6.5, 6.0, 6.8, 4.5, 5.9, 5.8, 5.9, 6.0, 5.5, 4.4, 6.1, 5.2, 4.1, 3.7, 4.7, 6.9, 3.5, 4.1, 6.4…
## $ pH_H20   <dbl> 5.77, 6.55, 6.82, 7.70, 5.53, 6.83, 6.14, 7.00, 4.66, 5.94, 6.10, 5.96, 6.18, 5.80, 4.57, 6.37, 5.70, 4.23, 4.34, 4.85…
## $ EC       <dbl> 8.52, 22.00, 17.34, 24.90, 38.90, 9.32, 10.88, 20.50, 46.00, 50.50, 15.48, 122.00, 54.10, 59.00, 8.90, 5.80, 26.80, 23…
## $ OC       <dbl> 36.3, 19.9, 15.7, 20.6, 34.1, 17.2, 18.1, 11.1, 97.8, 31.8, 10.2, 19.0, 60.1, 22.0, 40.3, 12.6, 23.3, 29.2, 162.2, 35.…
## $ CaCO3    <int> 0, 2, 0, 16, 0, 1, 0, 2, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 2, 1, 2, 0, 2, 1, 0, 0, 0, …
## $ P        <dbl> 60.0, 48.2, 67.7, 4.6, 45.0, 44.1, 49.2, 33.7, 30.6, 18.6, 51.8, 96.1, 64.7, 57.3, 22.2, 25.1, 37.0, 27.6, 72.7, 36.7,…
## $ N        <dbl> 4.1, 2.1, 1.8, 0.9, 4.3, 1.9, 1.9, 1.4, 5.7, 3.5, 1.2, 2.6, 6.3, 2.6, 2.1, 1.1, 2.7, 2.2, 10.1, 3.4, 2.8, 10.9, 3.2, 0…
## $ K        <dbl> 103.4, 192.1, 265.8, 90.1, 328.2, 123.8, 197.9, 149.4, 77.7, 91.1, 162.4, 724.9, 1752.9, 81.5, 38.0, 115.5, 89.9, 119.…
## $ LC       <chr> "E10", "B11", "B11", "C10", "E20", "B13", "B55", "B11", "C10", "E20", "B11", "B11", "E20", "C10", "C32", "B14", "C32",…
## $ LU       <chr> "U111", "U111", "U111", "U120", "U111", "U111", "U111", "U111", "U120", "U111", "U111", "U111", "U111", "U420", "U120"…
## $ NUTS_0   <chr> "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE"…
## $ NUTS_1   <chr> "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED", "DED",…
## $ NUTS_2   <chr> "DED4", "DED5", "DED5", "DED5", "DED4", "DED4", "DED4", "DED5", "DED5", "DED4", "DED5", "DED5", "DED4", "DED5", "DED5"…
## $ NUTS_3   <chr> "DED44", "DED52", "DED53", "DED52", "DED44", "DED45", "DED45", "DED53", "DED52", "DED45", "DED53", "DED52", "DED45", "…
## $ LC0_Desc <chr> "Grassland", "Cropland", "Cropland", "Woodland", "Grassland", "Cropland", "Cropland", "Cropland", "Woodland", "Grassla…
## $ LC1_Desc <chr> "Grassland with sparse tree/shrub cover", "Common wheat", "Common wheat", "Broadleaved woodland", "Grassland without t…
## $ LU1_Desc <chr> "Agriculture (excluding fallow land and kitchen gardens)", "Agriculture (excluding fallow land and kitchen gardens)", …
## $ geometry <POINT [°]> POINT (12.15614 50.48765), POINT (12.22172 51.27776), POINT (12.22607 51.3676), POINT (12.39081 51.22046), POINT…

We use the mapview package to display the data on an interactive map.

mapview(lucas)

We will need a layer of the German federal state of Saxony to prepare maps later on. We get it from the Database of Global Administrative Areas (GADM). We have used the GADM data for Germany before (gadm36_DEU_3_pk.gpkg). You can download them here. You can always query other countries or resolutions by following this tutorial.

saxony <- 
        st_read("data/gadm36_DEU_3_pk.gpkg") |> 
        st_as_sf() |> 
        filter(NAME_1 == "Sachsen")
## Reading layer `gadm36_DEU_3_pk' from data source `C:\Users\jonat\Documents\Uni\teaching\GIS\gisbook2\data\gadm36_DEU_3_pk.gpkg' using driver `GPKG'
## Simple feature collection with 4680 features and 16 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 5.866251 ymin: 47.27012 xmax: 15.04181 ymax: 55.05653
## Geodetic CRS:  WGS 84
mapview(saxony)

3.2 Proximity Polygons

We start out with proximity polygons. We can create Voronoi polygons around our points with st_voronoi().

#- st_voronoi works better with projected coordinate reference systems 
voroni <- st_voronoi(lucas)
## Warning in st_voronoi.sfc(st_geometry(x), st_sfc(envelope), dTolerance, : st_voronoi does not correctly triangulate longitude/latitude
## data

The warning message notifies us that it is advisable to use projected coordinates, so we transform our data from WGS84 to Lambert Azimuthal Equal Area (EPSG:3035).

lucas   %<>% st_transform(3035)
saxony %<>% st_transform(3035)

With the following code we create Voronoi polygons around the observations in lucas and then keep only those areas of the polygons that intersect with the saxony polygon. As you can see, we need more functions than just st_voronoi() to do this. With st_union() we combine the single POINT objects in lucas into a single MULTIPOINT object. With st_voronoi() we can now create the Voronoi polygons, which are returned in a GEOMETRYCOLLECTION. To extract the polygons from this collection, we use the st_collection_extract() function. It is well worth your time to execute the steps one by one and to check out the intermediate products.

voroni <- 
        lucas |> 
        st_union() |> 
        st_voronoi() |> 
        st_collection_extract() |> 
        st_intersection(y = saxony) |> 
        st_as_sf()

Next, we add the electrical conductivity (EC, our focal variable for this example) to the respective polygons. We assign a point to each polygon with the st_nearest_feature() function.

id <- st_nearest_feature(voroni, lucas)
voroni$EC <- lucas$EC[id]

Now we have the final product also shown in the lecture:

mapview(voroni, zcol = "EC") + mapview(lucas, zcol = "EC", legend = F)

3.3 Trend Surface Analysis

Trend surface analysis (TSA) is a multiple regression in which EC is explained by spatial coordinates. We can use the base R regression function lm() for TSA. To prepare the TSA, we need to extract the coordinates from the geometry column of lucas. For this we use the st_coordinates() function.

coords  <- st_coordinates(lucas)
x.coord <- coords[,1]
y.coord <- coords[,2]
lucas <- mutate(lucas, 
                x = x.coord, 
                y = y.coord
                )

Then we conduct a simple linear regression and also a polynomial regression with polynomial of second degree.

tsa1 <- lm(EC ~ x+y, data = lucas)
tsa2 <- lm(EC ~ polym(x, y, degree=2), data = lucas)
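If you want to see which terms the second-degree polynomial actually adds to the regression, you can inspect the model matrix. Here is a minimal base-R sketch with synthetic data (the variable names are made up for illustration):

```r
# Synthetic example: polym(x, y, degree = 2) generates five
# (orthogonal) polynomial terms: x, x^2, y, x*y, y^2.
set.seed(1)
x <- runif(20)
y <- runif(20)
z <- 2 + x + y + rnorm(20, sd = 0.1)

fit <- lm(z ~ polym(x, y, degree = 2))

# Intercept plus five polynomial terms = six coefficients
length(coef(fit))
colnames(model.matrix(fit))
```

The second-degree model therefore estimates six parameters instead of the three in the plain linear trend surface.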

To predict the values of unobserved locations with the TSA, we need the coordinates of these locations. Here, I show two different approaches: i) randomly distributed points in Saxony and ii) regularly spaced points.

The randomly distributed points are created with st_sample(). The first argument to this function is the geometry within which the points can lie. The second argument is the number of points. Here, we create 300 new points in Saxony.

random_points <- st_sample(saxony, 300)
mapview(random_points)

Again, we have to extract the coordinates from the geometry column with st_coordinates(). Currently, the random_points object is still of class sfc. This means it is only an sf geometry column and not a table to which we can add columns. To create an sf table, we first need to use the st_as_sf() function. The geometry columns of sf objects are typically called geom or geometry. However, after calling st_as_sf() the column is called "x". We rename it with the rename() function from the dplyr package. Lastly, we add the coordinates.

random_points <- 
        random_points |> 
        st_as_sf() |>
        rename(geom = x) |> 
        mutate(x = st_coordinates(random_points)[,1], 
               y = st_coordinates(random_points)[,2])

We predict the EC for these coordinates with predict().

random_points$EC_tsa1 <- predict(tsa1, random_points)
random_points$EC_tsa2 <- predict(tsa2, random_points)

Let’s have a look at the results. On the following map, circles are true observations and diamonds are predicted values.

breaks = c(0, 10, 15, 20, 25, 30, 35, 40, 170)
tmap_mode("plot")
## tmap mode set to plotting
tm_shape(saxony) + 
        tm_polygons() + 
        tm_shape(random_points) + 
        tm_dots(col = "EC_tsa1", shape = 23, size = .5, breaks = breaks) + 
        tm_shape(lucas) + 
        tm_dots(col = "EC", shape = 21, size = .5, breaks = breaks) + 
        tm_layout(legend.outside = TRUE)

tm_shape(saxony) +
        tm_polygons() + 
        tm_shape(random_points) + 
        tm_dots(col = "EC_tsa2", shape = 23, size = .5, breaks = breaks, midpoint = NA) + 
        tm_layout(legend.outside = TRUE) + 
        tm_shape(lucas) + 
         tm_dots(col = "EC", shape = 21, size = .5, breaks = breaks) 

Now to the regularly spaced points. First, we need the bounding box of saxony, which we can get with st_bbox().

saxony_bbox <- st_bbox(saxony)
saxony_bbox
##    xmin    ymin    xmax    ymax 
## 4453658 3009238 4672526 3178781

With st_as_sfc() we can create a polygon from the bounding box.

saxony_bbox_sfc <- st_as_sfc(saxony_bbox)
mapview(saxony_bbox_sfc) + mapview(saxony)

With st_make_grid() we can create a grid within the bounding box. In addition to the bounding box, we provide the function with the envisioned cell size. Our data have a projected coordinate reference system (LAEA), therefore the cell size is given in meters. Here we use square cells with a side length of 10 km.

saxony_grid <- st_make_grid(
        x = saxony_bbox_sfc, 
        cellsize = c(1e4,1e4),
        what = "polygons"
) |> 
        st_as_sf()

mapview(saxony_grid)
## Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a multiple of vector length (arg 1)
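As a plausibility check, the grid dimensions follow directly from the bounding box and the cell size. Here is a small base-R sketch using the bounding box values printed above:

```r
# Bounding box of Saxony in EPSG:3035 (values from st_bbox() above)
bbox <- c(xmin = 4453658, ymin = 3009238, xmax = 4672526, ymax = 3178781)
cellsize <- 1e4  # 10 km

# The grid must cover the bounding box, so the extent
# is rounded up to whole cells
n_cols <- ceiling((bbox["xmax"] - bbox["xmin"]) / cellsize)
n_rows <- ceiling((bbox["ymax"] - bbox["ymin"]) / cellsize)
unname(c(n_cols, n_rows))  # 22 columns, 17 rows -> 374 cells before cropping
```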

Now we need to crop the grid to saxony.

saxony_grid %<>% st_intersection(st_union(saxony))
mapview(saxony_grid)
## Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a multiple of vector length (arg 1)

Now we have several polygons, but not points. Since predictions are made for a single pair of x and y coordinates and the squares have four pairs, we need to extract points from the polygons. There are two possibilities: the corners of the polygons or their centroids. Both can be created with the st_make_grid() function.

saxony_grid_corner <- 
        st_make_grid(
                x = saxony_bbox_sfc, 
                cellsize = c(1e4,1e4),
                what = "corners") |> 
        st_as_sf() |> 
        st_intersection(saxony)
## Warning: attribute variables are assumed to be spatially constant throughout all geometries
map1 <- mapview(saxony_grid) + saxony_grid_corner
## Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a multiple of vector length (arg 1)
saxony_grid_centroid <- 
        st_make_grid(
        x = saxony_bbox_sfc, 
        cellsize = c(1e4,1e4),
        what = "centers") |> 
        st_as_sf() |> 
        st_intersection(saxony)
## Warning: attribute variables are assumed to be spatially constant throughout all geometries
map2 <- mapview(saxony_grid) + saxony_grid_centroid
## Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a multiple of vector length (arg 1)
map1 | map2
## Loading required namespace: leaflet.extras2

We could also have extracted the centroids from the polygons with the following code:

saxony_grid_centroid <- 
        saxony_grid |> 
        st_centroid() |> 
        st_as_sf() |> 
        rename(geom = x)

Now we can make predictions for the centroids and transfer them back to the grid polygons for visualization.

saxony_grid_centroid %<>% 
        mutate(
                x = st_coordinates(saxony_grid_centroid)[,1], 
                y = st_coordinates(saxony_grid_centroid)[,2]
        )

saxony_grid_centroid$EC_tsa_1 <- predict(tsa1, saxony_grid_centroid)
saxony_grid_centroid$EC_tsa_2 <- predict(tsa2, saxony_grid_centroid)

saxony_grid$EC_tsa1 <- saxony_grid_centroid$EC_tsa_1
saxony_grid$EC_tsa2 <- saxony_grid_centroid$EC_tsa_2

mapview(saxony_grid, zcol = "EC_tsa1") + 
        mapview(lucas, zcol = "EC", legend = F)
mapview(saxony_grid, zcol = "EC_tsa2") + 
        mapview(lucas, zcol = "EC", legend = F)

3.4 Splines

Next, we have a quick look at interpolation with splines. For this we need the fields package. We use a type of spline called a thin plate spline, which can be fitted with the Tps() function. As arguments, we provide the coordinates as a matrix and the dependent variable (EC).

tps_fit <- Tps(x = matrix(c(lucas$x, lucas$y), ncol = 2), Y = lucas$EC)

Again, we can use the predict() function to predict the electrical conductivity for unobserved locations. The result is a matrix which we transform to a numeric vector with as.numeric().

tps_prediction <- predict(tps_fit, st_drop_geometry(saxony_grid_centroid[, c("x", "y")]))
saxony_grid$EC_tps <- as.numeric(tps_prediction)
mapview(saxony_grid, zcol = "EC_tps") + 
        mapview(lucas, zcol = "EC", legend = F)

3.5 Weighted average

In the rest of the script we cover methods that weight the observed values based on their distance to the predicted location. One method we did not discuss in the lecture is k nearest neighbors. With this method we consider the k nearest points and compute their mean value. These k points are all weighted equally; all other points are not considered.

For all the weighted average procedures we will use the gstat package. With the eponymous gstat() function we create a model. As in lm() regression models, we start with the formula. ~1 indicates that we do not wish to use any explanatory variables but instead estimate the mean value of the dependent variable. With locations we choose the data set that provides the spatial coordinates, nmax gives the maximum number of points to use in any one prediction (k), and idp is the inverse distance power. It determines how strongly distant points are down-weighted. Here we choose the five nearest neighbors and no distance weighting, i.e., idp = 0.
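To see what idp does, it helps to compute an inverse-distance prediction by hand. Here is a minimal base-R sketch with made-up observations (no gstat involved):

```r
# Four synthetic observations and one prediction location
obs_x <- c(0, 1, 0, 1)
obs_y <- c(0, 0, 1, 1)
obs_z <- c(10, 20, 30, 40)

idw_predict <- function(new_x, new_y, idp) {
  d <- sqrt((obs_x - new_x)^2 + (obs_y - new_y)^2)
  w <- 1 / d^idp            # weights decay with distance^idp
  sum(w * obs_z) / sum(w)   # weighted average of the observations
}

idw_predict(0.2, 0.2, idp = 0)  # equal weights: the plain mean, 25
idw_predict(0.2, 0.2, idp = 2)  # nearby points dominate, pulled toward 10
```

With idp = 0 every point in the neighborhood gets the same weight, which is exactly the k nearest neighbors mean; larger values of idp concentrate the weight on the closest observations.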

knn_mod <- gstat(formula = EC ~ 1, locations = lucas, nmax = 5, set = list(idp = 0))
knn_pred <- predict(knn_mod, saxony_grid)
## [inverse distance weighted interpolation]
mapview(knn_pred, zcol = "var1.pred") + mapview(lucas, zcol = "EC", legend = F)

For inverse distance weighting (IDW) the model construction and prediction are packaged into a single function.

idw1 <- idw(EC ~ 1, locations = lucas, newdata = saxony_grid_centroid, idp = 2)
## [inverse distance weighted interpolation]
idw2 <- idw(EC ~ 1, locations = lucas, newdata = saxony_grid_centroid, idp = 3)
## [inverse distance weighted interpolation]
saxony_grid$EC_idw_1 <- idw1$var1.pred
saxony_grid$EC_idw_2 <- idw2$var1.pred
mapview(saxony_grid, zcol = "EC_idw_1") + mapview(lucas, zcol = "EC", legend = F)
mapview(saxony_grid, zcol = "EC_idw_2") + mapview(lucas, zcol = "EC", legend = F)

3.6 Kriging

First, we compute the empirical variogram. gstat is becoming increasingly compatible with sf, but it was originally designed for its predecessor sp. To be on the safe side, we will transform all sf objects into sp objects. Luckily, this can be done with a single function.

lucas_sp <- as(lucas, "Spatial")

Now we can plot a variogram cloud and fit the empirical variogram with the variogram() function.

v_emp_cloud <- variogram(EC~1, lucas_sp, cloud = T)
v_emp <- variogram(EC~1, lucas_sp)

As with the idw() function, we do not assume that there is a spatial trend and construct a variogram for a constant mean value.

plot(v_emp_cloud)

plot(v_emp)
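To demystify what variogram() computes, here is the empirical semivariance for a single lag class, written out in base R on synthetic data: gamma(h) is half the mean squared difference between all point pairs whose separation falls into the lag class.

```r
# Synthetic points and values
set.seed(42)
n   <- 30
pts <- cbind(runif(n), runif(n))
z   <- rnorm(n)

d <- as.matrix(dist(pts))  # pairwise distances

# All pairs whose separation falls into the lag class (0.2, 0.3]
pairs <- which(d > 0.2 & d <= 0.3 & upper.tri(d), arr.ind = TRUE)

# Semivariance: half the mean squared difference over those pairs
gamma_h <- mean((z[pairs[, 1]] - z[pairs[, 2]])^2) / 2
gamma_h
```

The empirical variogram simply repeats this calculation for a series of lag classes; the variogram cloud plots the individual squared differences before averaging.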

To find the best variogram model we can either fit a selection of models manually …

# spherical
v_mod_sph <- fit.variogram(
        object = v_emp, 
        model  = vgm("Sph")
)
# exponential
v_mod_exp <- fit.variogram(
        object = v_emp, 
        model  = vgm("Exp")
)
plot(v_emp, v_mod_sph)

plot(v_emp, v_mod_exp)

… or we use the automap package to automatically fit and compare multiple models.

autovar <- autofitVariogram(formula = EC ~ 1, 
                 input_data = lucas_sp)
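Whichever way we fit it, the variogram model is a parametric curve. As an illustration, here is the spherical model written out as a base-R function; nugget, partial sill, and range are the parameters that fit.variogram() and autofitVariogram() estimate:

```r
# Spherical variogram model:
# gamma(h) = nugget + psill * (1.5*h/range - 0.5*(h/range)^3)  for h <= range
# gamma(h) = nugget + psill                                    for h  > range
vgm_sph <- function(h, nugget, psill, range) {
  ifelse(h <= range,
         nugget + psill * (1.5 * h / range - 0.5 * (h / range)^3),
         nugget + psill)
}

# Example curve with made-up parameters
h <- seq(0, 100, length.out = 200)
plot(h, vgm_sph(h, nugget = 2, psill = 10, range = 60), type = "l",
     xlab = "distance h", ylab = "semivariance")
# The curve reaches the sill (nugget + psill = 12) exactly at the range.
```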

With this variogram we can use kriging to interpolate the EC values.

kriging <- krige(
        formula = EC ~ 1,
        locations = lucas_sp,
        newdata = saxony_grid_centroid,
        model = autovar$var_model
)
## [using ordinary kriging]
saxony_grid$EC_krige <- kriging$var1.pred
saxony_grid$EC_krige_variance <- kriging$var1.var

mapview(saxony_grid, zcol = "EC_krige") + 
        mapview(lucas, zcol = "EC", legend = F)

Lastly, we can look at the kriging variance, which shows our uncertainty about the predicted values. The further we are removed from measured points, the larger the uncertainty.

mapview(saxony_grid, zcol = "EC_krige_variance") + 
        mapview(lucas)

References

Douglas Nychka, Reinhard Furrer, John Paige, and Stephan Sain. 2021. “Fields: Tools for Spatial Data.” Boulder, CO, USA: University Corporation for Atmospheric Research. https://github.com/dnychka/fieldsRPackage.
Gräler, Benedikt, Edzer Pebesma, and Gerard Heuvelink. 2016. “Spatio-Temporal Interpolation Using Gstat.” The R Journal 8: 204–18. https://journal.r-project.org/archive/2016/RJ-2016-014/index.html.
Hiemstra, P. H., E. J. Pebesma, C. J. W. Twenhöfel, and G. B. M. Heuvelink. 2008. “Real-Time Automatic Interpolation of Ambient Gamma Dose Rates from the Dutch Radioactivity Monitoring Network.” Computers & Geosciences.